Utilizing Haver Analytics Data in R
In this tutorial, we will explore how to integrate and use Haver Analytics data within R. We do this with the Haver package rather than IMF Datatools, which we explored previously.
Pre-requisites
Loading additional packages
To begin, we need to make sure the Haver package is installed and loaded into our environment. The package should already have been installed during the setup portion at the beginning of the book, but if you are jumping straight to this section, use the following command in your R console to install it.
install.packages("Haver")
Next, use this command to load the Haver package.
library(Haver)
Using DLXDB environment variable for setting the Haver path.
Haver path set to: \\imfdata\econ\DATA\DLX\DATA\
Restoring default Haver query limits.
Before accessing any Haver databases, we need to configure the path to our Haver data directory. This step is crucial for R to locate and interact with the data.
haver.path("\\\\imfdata\\econ\\data\\dlx\\data\\")
Haver path set to: \\imfdata\econ\DATA\DLX\DATA\
You should now be set up to use Haver data directly in R.
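The doubled backslashes in the haver.path() call are R string escapes, not part of the path itself. The short sketch below does not touch Haver at all; it only shows that the escaped string and an equivalent raw string (available in R 4.0.0 and later) describe the same network path. The variable names are illustrative.

```r
# Each "\\" inside a regular R string literal stands for one literal backslash
p_escaped <- "\\\\imfdata\\econ\\data\\dlx\\data\\"

# A raw string (R >= 4.0.0) lets you type the path as it appears in Windows
p_raw <- r"(\\imfdata\econ\data\dlx\data\)"

# Both literals hold exactly the same characters
same_path <- identical(p_escaped, p_raw)

# writeLines() prints the string as R actually stores it
writeLines(p_escaped)
```

If the path printed by writeLines() does not match the location of your Haver data directory, the most likely cause is a missing or extra backslash in the literal.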
Accessing Haver Analytics Data
Retrieving Data from Haver
The haver.data function allows us to retrieve data from a specific Haver database. Let’s retrieve annual debt-to-GDP data for several countries from the “EMERGELA” database and view the tail end of the data.
# Input the desired series codes
vars <- c('A312FDPG','A311FDPG','A213FDP','A223FDP','A321FDPG','A243FDP')
# Specify the database and frequency
mydata <- haver.data(codes = vars, database = "emergela", frequency = "annual")
# View the tail end of the data
tail(mydata)
a312fdpg a311fdpg a213fdp a223fdp a321fdpg a243fdp
2019 48.49 74.57 89.60 74.44 78.04 40.4
2020 68.29 96.89 103.66 86.94 109.10 56.6
2021 55.13 90.13 80.65 77.31 109.19 50.4
2022 33.52 79.56 84.95 71.68 105.09 45.5
2023 NA NA 156.27 73.83 NA 45.1
2024 NA NA 83.07 76.50 NA 46.3
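The trailing NA values in this output show that the series do not all end in the same year. As a rough sketch, the snippet below rebuilds two of the printed columns as an ordinary data frame (a stand-in, not the object that haver.data returns) and finds the last year with data for each series. The names debt, years, and last_obs are illustrative.

```r
# Stand-in for two columns of the output above (values copied from the printed table)
debt <- data.frame(
  a312fdpg = c(48.49, 68.29, 55.13, 33.52, NA, NA),
  a213fdp  = c(89.60, 103.66, 80.65, 84.95, 156.27, 83.07),
  row.names = 2019:2024
)

# Row names hold the years as text, so convert them to integers
years <- as.integer(rownames(debt))

# For each series, find the most recent year with a non-missing value
last_obs <- sapply(debt, function(x) max(years[!is.na(x)]))
last_obs
```

A check like this is useful before comparing countries, since a common end year may have to be chosen by hand.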
You can also retrieve the metadata for an entire Haver database, rather than for selected series codes, but this takes quite a while to load. The command is provided below, but running it is optional.
hv_metadata_df <- as.data.frame(haver.metadata(database = "USECON"))
Retrieving Metadata from Haver
It is also important to understand the metadata of the variables we are working with.
Let’s use the haver.metadata function to retrieve this information and list the descriptions of our variables to double-check that they are the series we want.
mymeta <- haver.metadata(codes = vars, database = "emergela")
mymetadf <- as.data.frame(mymeta)
mydesc <- mymetadf[, c("code", "descriptor")]
tail(mydesc)
code descriptor
1 a312fdpg Anguilla: Public Sector Debt to GDP (%)
2 a311fdpg Antigua and Barbuda: Public Sector Debt to GDP (%)
3 a213fdp Argentina: Gross Public Debt as a % of GDP (%)
4 a223fdp Brazil: Gross General Government Debt as a % of GDP (%)
5 a321fdpg The Commonwealth of Dominica: Public Sector Debt to GDP (%)
6 a243fdp Dominican Republic: Public Debt as % of GDP (%)
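Once the descriptors are in a data frame, ordinary base R tools can be used to search them. The sketch below works on a small hand-built stand-in with three of the rows printed above (it does not call haver.metadata), keeps the series whose descriptor mentions a gross debt concept, and extracts the country name. The object names are illustrative.

```r
# Stand-in metadata with rows copied from the printed output above
meta_demo <- data.frame(
  code = c("a312fdpg", "a213fdp", "a223fdp"),
  descriptor = c(
    "Anguilla: Public Sector Debt to GDP (%)",
    "Argentina: Gross Public Debt as a % of GDP (%)",
    "Brazil: Gross General Government Debt as a % of GDP (%)"
  ),
  stringsAsFactors = FALSE
)

# Keep only series whose descriptor contains the word "Gross"
gross_debt <- meta_demo[grepl("Gross", meta_demo$descriptor, fixed = TRUE), ]

# Descriptors follow a "Country: concept" pattern, so drop everything from the colon on
country <- sub(":.*$", "", gross_debt$descriptor)
country
```

This kind of filter makes it easier to notice that the six series do not all measure the same debt concept.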
Retrieving Data for a specific period
What if we wanted to retrieve data only for a specific period?
Let’s retrieve a higher frequency indicator, Argentina’s and Mexico’s currency in circulation, and specify the period and format we want to retrieve.
# Specifying the series codes and respective database
currency <- haver.data(codes = c("n213fmtc", "c273fmce"),
                       database = "EMERGELA", freq = "q",
                       start = as.Date("2010-01-01", format = "%Y-%m-%d"))
head(currency)
n213fmtc c273fmce
2010-Q1 94302.10 597193.9
2010-Q2 95156.17 577815.5
2010-Q3 104835.63 588091.8
2010-Q4 114378.90 693423.1
2011-Q1 127734.27 634711.8
2011-Q2 132877.80 635323.3
In the start argument, the as.Date() function turns the string "2010-01-01" into a date that R can understand. The format part tells R how the date is written (in this case, year-month-day). This lets R know exactly which date to start from when pulling the data, as the first rows of the output above confirm.
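To see why the format string matters, here is a small base R sketch that parses dates written in two different layouts. It is independent of Haver; the variable names are illustrative.

```r
# ISO layout: year-month-day, as used in the haver.data call above
d_iso <- as.Date("2010-01-01", format = "%Y-%m-%d")

# Day/month/year layout needs a matching format string
d_dmy <- as.Date("01/02/2010", format = "%d/%m/%Y")  # 1 February 2010

# A format that does not match the text yields NA rather than an error
d_bad <- as.Date("01/02/2010", format = "%Y-%m-%d")

format(d_dmy, "%Y-%m-%d")
```

Because a mismatched format fails quietly with NA, it is worth printing the parsed date once before using it as a start value.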
For clarity, we should replace the series codes in the column names with descriptive names.
#Renaming series codes to descriptors
colnames(currency)[colnames(currency) == "n213fmtc"] <- "Arg_Curr_in_Circ"
colnames(currency)[colnames(currency) == "c273fmce"] <- "Mex_Curr_in_Circ"
Let’s quickly check again that the retrieval starts at the correct period and that the series have been renamed correctly.
head(currency)
Arg_Curr_in_Circ Mex_Curr_in_Circ
2010-Q1 94302.10 597193.9
2010-Q2 95156.17 577815.5
2010-Q3 104835.63 588091.8
2010-Q4 114378.90 693423.1
2011-Q1 127734.27 634711.8
2011-Q2 132877.80 635323.3
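When there are more than a couple of series, renaming them one line at a time becomes tedious. One possible alternative, sketched here on a plain data frame stand-in that holds the first two printed rows, is to keep the code-to-name mapping in a named vector. The names currency_demo and new_names are illustrative.

```r
# Stand-in for the first two rows of the currency data shown above
currency_demo <- data.frame(
  n213fmtc = c(94302.10, 95156.17),
  c273fmce = c(597193.9, 577815.5),
  row.names = c("2010-Q1", "2010-Q2")
)

# Lookup table: series code -> descriptive column name
new_names <- c(n213fmtc = "Arg_Curr_in_Circ", c273fmce = "Mex_Curr_in_Circ")

# Look up each existing column name and replace it in one step
colnames(currency_demo) <- unname(new_names[colnames(currency_demo)])
colnames(currency_demo)
```

This sketch assumes every column has an entry in the lookup vector; a column without one would end up with an NA name.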
Data Formats and Aggregations Methods
Setting up the Data for Analysis
When downloading data from Haver Analytics, you can specify the format in which you retrieve the data (e.g., zoo, xts) and control how the data is aggregated (e.g., end of period, period average). This flexibility is helpful when working with different time series structures and performing specific analyses.
Retrieving Data in Different Formats
Haver allows you to download data in formats like a plain data frame, or zoo or xts, which are well-suited for time series analysis.
Downloading Data as a Plain Data Frame
The as.data.frame function allows you to retrieve data in a basic data frame format, without any time series-specific structure. This can be helpful for tasks where time series functionality isn’t necessary.
# Downloading data as a plain data frame
lr_us_df <- as.data.frame(haver.data(codes = "LR", database = "USECON", freq = "m"))
# Display the first few rows of the data frame
head(lr_us_df)
lr
1948-Jan 3.4
1948-Feb 3.8
1948-Mar 4.0
1948-Apr 3.9
1948-May 3.5
1948-Jun 3.6
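A plain data frame carries the periods only as text labels such as 1948-Jan. If real dates are needed later, the labels can be parsed by hand. The sketch below assumes the labels are available as a character vector in the layout printed above (for example from the row names); that is an assumption about the object, not something this tutorial confirms.

```r
# Period labels in the layout printed above (assumed to be plain character strings)
labels <- c("1948-Jan", "1948-Feb", "1948-Mar")

# Split each label into a year and a month abbreviation
yr <- as.integer(substr(labels, 1, 4))
mo <- match(substr(labels, 6, 8), month.abb)  # month.abb is built into base R

# Build first-of-month dates from the pieces
dates <- as.Date(sprintf("%04d-%02d-01", yr, mo))
format(dates)
```

Matching against month.abb avoids the locale dependence of parsing month abbreviations with as.Date directly.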
Downloading Data as a zoo Object
The haver.as.zoo() function converts Haver data into a zoo object, which is a common format for time series data. You would choose zoo for general time series tasks, especially when dealing with irregular or missing data. This is how you would perform this conversion.
#Load the xts and zoo library
library(xts)
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
library(zoo)
# Convert the Haver data to a zoo object
data_zoo <- haver.as.zoo(haver.data(codes = "LR", database = "USECON", freq = "m"))
# Display the first few rows of the zoo object
head(data_zoo)
lr
1948-01-31 3.4
1948-02-29 3.8
1948-03-31 4.0
1948-04-30 3.9
1948-05-31 3.5
1948-06-30 3.6
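Notice that the zoo index above uses the last day of each month. The base R sketch below reproduces the first three of those dates, which can be handy when you need to line up other monthly data with the same index. It illustrates the date arithmetic only and makes no claim about how haver.as.zoo builds its index.

```r
# First day of the three months that follow January, February, and March 1948
month_starts <- seq(as.Date("1948-02-01"), by = "month", length.out = 3)

# Subtracting one day gives the last day of the preceding month
month_ends <- month_starts - 1
format(month_ends)
```

Stepping back one day from the next month's start handles leap years automatically, which is why February 1948 comes out as the 29th.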
Downloading Data as an xts Object
The xts format is preferred for financial or economic data that require advanced time series handling, particularly with regular intervals like daily or monthly data. If you prefer working with xts, which extends the functionality of zoo, you can convert the data as follows:
# Convert the zoo object to an xts object
data_xts <- as.xts(data_zoo)
# Display the first few rows of the xts object
head(data_xts)
lr
1948-01-31 3.4
1948-02-29 3.8
1948-03-31 4.0
1948-04-30 3.9
1948-05-31 3.5
1948-06-30 3.6
We can also chain the two conversions to turn Haver data into an xts object in one step, as in the example below using our previously downloaded currency data.
# Convert Haver data to xts via zoo
currency <- as.xts(haver.as.zoo(currency))
Lastly, the data sometimes contain NA values. You can use the following command to drop them.
# Remove NA values
currency <- na.omit(currency)
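One thing to keep in mind is that na.omit removes an entire row as soon as any column in it is missing. The sketch below uses a small data frame with made-up numbers (purely illustrative, not Haver data) to contrast that behavior with keeping every row that has at least one observation.

```r
# Illustrative values only: two series with gaps in different periods
x <- data.frame(a = c(1, NA, 3, 4), b = c(10, 20, NA, 40))

# na.omit keeps only rows where every column is observed (rows 1 and 4)
complete_rows <- na.omit(x)

# Alternative: keep rows where at least one column is observed (all 4 rows here)
any_data <- x[rowSums(!is.na(x)) > 0, ]

nrow(complete_rows)
nrow(any_data)
```

With several series of different lengths, the row-wise behavior of na.omit can shorten the sample more than expected, so it is worth comparing row counts before and after.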
Specifying Aggregation Methods
When downloading data from Haver, you may need to aggregate the values differently depending on your analysis requirements. For example, you may want to retrieve data based on the end of the period, period average, or perform more complex aggregations like summing monthly data into quarterly values.
Haver allows you to control these behaviors using the aggmode parameter. The key modes are:
Strict Mode: Aggregate only if all data points for the period are available.
Relaxed Mode: Aggregate if at least one data point for the period is available.
Force Mode: Always aggregate, even if data points are missing.
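To make the difference between the first two modes concrete, here is a toy base R function that applies the definitions above to the monthly values of a single quarter. It is only an illustration of those definitions with made-up numbers, not Haver's implementation; the function name agg_quarter is hypothetical, and force mode is left out because the text does not say what value it produces when data are missing.

```r
# Toy illustration of the "strict" and "relaxed" definitions (not Haver's code)
agg_quarter <- function(x, mode = c("strict", "relaxed")) {
  mode <- match.arg(mode)
  n_obs <- sum(!is.na(x))
  # strict: refuse to aggregate unless every observation is present
  if (mode == "strict" && n_obs < length(x)) return(NA_real_)
  # relaxed: refuse only when there is nothing to aggregate
  if (mode == "relaxed" && n_obs == 0) return(NA_real_)
  mean(x, na.rm = TRUE)
}

q_complete <- c(1, 2, 3)    # all three months observed
q_partial  <- c(4, NA, NA)  # only the first month observed

agg_quarter(q_complete, "strict")
agg_quarter(q_partial, "strict")
agg_quarter(q_partial, "relaxed")
```

The partial quarter is the interesting case: it is dropped under the strict rule but reported under the relaxed rule, based on a single month.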
Downloading Data Using End of Period Values
The end-of-period aggregation is useful when you’re interested in the final value of a time period, such as the last day of a month or the last quarter of the year. This can be applied when you need to understand the status of a variable at the period’s conclusion.
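To get the end-of-period values, this tutorial passes aggmode = "strict". This ensures that Haver aggregates only if all the data points for the period are available, which protects data integrity. Keep in mind that, by the definitions above, aggmode governs how periods with missing observations are handled, so it is worth confirming which aggregation (end of period, average, or sum) Haver actually applies to a given series.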
Let’s look at this in the case of the U.S. Federal Funds rate.
# Retrieve data with end-of-period aggregation for U.S. Federal Funds Rate
ffr_end_period <- haver.data(codes = "FFED", database = "USECON", freq = "q", aggmode = "strict")
# Display the last few rows
tail(ffr_end_period)
ffed
2024-Q1 5.330000
2024-Q2 5.330000
2024-Q3 5.263333
2024-Q4 4.650000
2025-Q1 4.330000
2025-Q2 NA
Downloading Data Using Period Averages
For situations where you’re interested in the average value over a period (e.g., monthly data averaged over a quarter), you can use period averages. This is helpful when you’re looking to smooth out volatility or report aggregate trends across a time period.
To download period average data, use aggmode = "relaxed". This will aggregate the data as long as at least one data point for the period is available. Let’s look at how this would work for the U.S. consumer price index (inflation).
# Retrieve data with period average aggregation for U.S. CPI
cpi_us_avg <- haver.data(codes = "PCUN", database = "USECON", freq = "q", aggmode = "relaxed")
# Display the last few rows
tail(cpi_us_avg)
pcun
2024-Q1 310.3583
2024-Q2 313.9307
2024-Q3 314.8790
2024-Q4 315.5873
2025-Q1 318.8507
2025-Q2 NA
Downloading Data Using Forced Aggregation (Sum)
In some cases, you may need to sum data over a period, such as when summing monthly GDP to calculate quarterly GDP. This is particularly useful for series where the sum over a period provides more insight than the average or end value.
To force sum aggregation, use aggmode = "force". This ensures that data is aggregated even if some data points for the period are missing.
# Retrieve data with forced sum aggregation (e.g., summing monthly data to quarterly GDP)
gdp_sum <- haver.data(codes = "GDP", database = "USECON", freq = "q", aggmode = "force")
# Display the first few rows
head(gdp_sum)
gdp
1947-Q1 243.2
1947-Q2 246.0
1947-Q3 249.6
1947-Q4 259.7
1948-Q1 265.7
1948-Q2 272.6
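The three aggregation concepts discussed in this section (end of period, period average, and sum) can give very different quarterly numbers from the same monthly data. The base R sketch below computes all three by hand on made-up monthly values, which can serve as a cross-check on what a download returns. It does not use Haver, and the numbers are purely illustrative.

```r
# Illustrative monthly values for two quarters (not real data)
monthly <- c(1, 2, 3, 4, 5, 6)
qtr     <- rep(c("Q1", "Q2"), each = 3)

# End of period: take the last monthly value in each quarter
eop <- tapply(monthly, qtr, function(v) v[length(v)])

# Period average: mean of the monthly values in each quarter
avg <- tapply(monthly, qtr, mean)

# Sum: total of the monthly values in each quarter
tot <- tapply(monthly, qtr, sum)

rbind(eop, avg, tot)
```

Comparing a downloaded quarterly value against these three candidates, computed from the monthly series, is a quick way to tell which aggregation was applied.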
Example: U.S. Unemployment Rate
Now that we have learned how to retrieve and clean data, let’s retrieve the U.S. unemployment rate from Haver and visualize it in a simple line graph.
Retrieving U.S. Unemployment Rate
We’ll use the haver.data function to retrieve the U.S. unemployment rate from the “USECON” database. The series code for this data is “LR”.
# Specifying the series code for the U.S. unemployment rate from the USECON database
lr_us <- haver.data(codes = "LR", database = "USECON", freq = "m")
# Display the first few rows of the retrieved data
head(lr_us)
lr
1948-Jan 3.4
1948-Feb 3.8
1948-Mar 4.0
1948-Apr 3.9
1948-May 3.5
1948-Jun 3.6
Converting Data to a Time Series (zoo)
Next, we convert the retrieved data into a zoo object, as shown earlier, which is well-suited for time series analysis.
# Load the necessary libraries for time series data
library(xts)
library(zoo)
# Convert the Haver data into a zoo object for time series handling
lr_us_zoo <- haver.as.zoo(lr_us)
# Display the first few rows of the converted zoo object
head(lr_us_zoo)
lr
1948-01-31 3.4
1948-02-29 3.8
1948-03-31 4.0
1948-04-30 3.9
1948-05-31 3.5
1948-06-30 3.6
Plotting U.S. Unemployment Rate
We can now plot the zoo object directly using the autoplot function from ggplot2. The zoo package supplies an autoplot method for zoo objects, which simplifies the plotting process.
# Load ggplot2, which provides the autoplot generic
library(ggplot2)
# Directly plot the zoo object using autoplot.zoo
autoplot(lr_us_zoo) +
  labs(title = "U.S. Unemployment Rate (Monthly)",
       x = "Time",
       y = "Unemployment Rate (%)") +
  theme_minimal()
In this tutorial, we explored how to effectively utilize Haver Analytics data in R, from data retrieval to formatting, aggregation, and visualization.